Simple topological properties predict functional misannotations in a metabolic network
Abstract
MOTIVATION: Misannotation in sequence databases is an important obstacle for automated gene function annotation tools, which rely extensively on comparison with sequences of known function. To improve current annotations and prevent future propagation of errors, sequence-independent tools are therefore needed to help identify misannotated gene products. In the case of enzymatic functions, each functional assignment implies the existence of a reaction within the organism's metabolic network, so a first approximation to a genome-scale metabolic model can be obtained directly from an automated genome annotation. Any obvious problems in the network, such as dead-end or disconnected reactions, can therefore be strong indications of misannotation.
RESULTS: We demonstrate that a machine-learning approach using only network topological features can successfully predict the validity of enzyme annotations. The predictions are tested at three different levels. First, a random forest using topological features of the metabolic network and trained on curated sets of correct and incorrect enzyme assignments was found to have an accuracy of up to 86% in 5-fold cross-validation experiments. Second, further cross-validation against unseen enzyme superfamilies indicates that this classifier can successfully extrapolate beyond the classes of enzyme present in the training data. Third, the random forest model was applied to several automated genome annotations, achieving an accuracy of ~60% in most cases when validated against recent genome-scale metabolic models. When the model is applied to draft metabolic networks for multiple species, we also observe a clear negative correlation between predicted annotation quality and phylogenetic distance to the major model organism for biochemistry (Escherichia coli for prokaryotes and Homo sapiens for eukaryotes).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
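The idea that dead-end or disconnected reactions hint at misannotation can be illustrated with a small sketch. The code below is not the authors' feature set or classifier; it is a minimal example, under stated assumptions, of how a few topological features might be computed per reaction. The function name `topological_features`, the feature names, the toy network and the `boundary` set of exchanged metabolites are all hypothetical, and every reaction is treated as irreversible for simplicity.

```python
from collections import defaultdict


def topological_features(reactions, boundary=frozenset()):
    """Compute simple per-reaction topological features.

    reactions: dict mapping reaction name -> (substrates, products).
    boundary:  metabolites assumed to be exchanged with the environment,
               so they are never counted as dead ends (an assumption).
    """
    producers = defaultdict(set)  # metabolite -> reactions producing it
    consumers = defaultdict(set)  # metabolite -> reactions consuming it
    for name, (substrates, products) in reactions.items():
        for met in substrates:
            consumers[met].add(name)
        for met in products:
            producers[met].add(name)

    # A dead-end metabolite is only produced or only consumed in the network.
    metabolites = set(producers) | set(consumers)
    dead_ends = {
        met for met in metabolites
        if met not in boundary
        and (not producers.get(met) or not consumers.get(met))
    }

    features = {}
    for name, (substrates, products) in reactions.items():
        mets = set(substrates) | set(products)
        # Neighbours are the other reactions sharing at least one metabolite.
        neighbours = set()
        for met in mets:
            neighbours |= producers.get(met, set()) | consumers.get(met, set())
        neighbours.discard(name)
        features[name] = {
            "n_dead_end_metabolites": len(mets & dead_ends),
            "n_neighbours": len(neighbours),
            "is_disconnected": not neighbours,
        }
    return features


# Hypothetical toy network: R1-R3 form a linear pathway, R4 is isolated,
# and R5 branches off into a metabolite that nothing consumes.
toy_network = {
    "R1": ({"glc"}, {"g6p"}),
    "R2": ({"g6p"}, {"f6p"}),
    "R3": ({"f6p"}, {"pyr"}),
    "R4": ({"x"}, {"y"}),
    "R5": ({"g6p"}, {"z"}),
}
features = topological_features(toy_network, boundary={"glc", "pyr"})
for reaction, values in sorted(features.items()):
    print(reaction, values)
```

In this toy case R4 is flagged as disconnected and R5 touches one dead-end metabolite, while the pathway reactions R1 to R3 show neither problem. Feature vectors of this kind could then be passed to a classifier such as a random forest, which is the general approach the abstract describes.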
Related resources
Automatic policing of biochemical annotations using genomic correlations
With the increasing role of computational tools in the analysis of sequenced genomes, there is an urgent need to maintain high accuracy of functional annotations. Misannotations can be easily generated and propagated through databases by functional transfer based on sequence homology. We developed and optimized an automatic policing method to detect biochemical misannotations using context geno...
Co-Regulation of Metabolic Genes Is Better Explained by Flux Coupling Than by Network Distance
To what extent can modes of gene regulation be explained by systems-level properties of metabolic networks? Prior studies on co-regulation of metabolic genes have mainly focused on graph-theoretical features of metabolic networks and demonstrated a decreasing level of co-expression with increasing network distance, a naïve, but widely used, topological index. Others have suggested that static g...
Quantitative Structure-Property Relationship to Predict Quantum Properties of Monocarboxylic Acids By using Topological Indices
Topological indices are numerical values associated with chemical constitution, used to correlate chemical structure with various physical properties, chemical reactivity or biological activity. Graph theory is a delightful playground for the exploration of proof techniques in discrete mathematics, and its results have applications in many areas of science. A graph is a ...
Using the topology of metabolic networks to predict viability of mutant strains.
Understanding the relationships between the structure (topology) and function of biological networks is a central question of systems biology. The idea that topology is a major determinant of systems function has become an attractive and highly disputed hypothesis. Although structural analysis of interaction networks demonstrates a correlation between the topological properties of a node (prote...
Constraint-based functional similarity of metabolic genes: going beyond network topology
MOTIVATION Several recent studies attempted to establish measures for the similarity between genes that are based on the topological properties of metabolic networks. However, these approaches offer only a static description of the properties of interest and offer moderate (albeit significant) correlations with pertinent experimental data. RESULTS Using a constraint-based large-scale metaboli...
Journal title:
Volume 29, Issue
Pages -
Publication date: 2013